Large Reasoning Models Struggle with Instruction Adherence, Study Reveals
Together AI's latest research exposes a critical weakness in large reasoning models (LRMs): they inconsistently follow user instructions during complex reasoning tasks. The study introduces ReasonIF, a 300-problem benchmark that tests whether a model's reasoning trace adheres to instructions spanning multilingual reasoning, formatting constraints, and word limits.
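The paper's exact scoring pipeline is not reproduced here, but the core idea can be illustrated with a minimal Python sketch using hypothetical helper names (not ReasonIF's actual code): each reasoning trace is checked against the instructions it was given, such as a word limit and a formatting rule, and adherence is the fraction of checks the trace passes.

```python
import re

# Toy illustration of instruction-adherence scoring on a reasoning trace.
# The function names, the "Step N:" format rule, and the scoring scheme are
# assumptions for demonstration, not the benchmark's real implementation.

def within_word_limit(trace: str, max_words: int) -> bool:
    """Return True if the reasoning trace stays within the word limit."""
    return len(trace.split()) <= max_words

def uses_required_format(trace: str, prefix: str = "Step") -> bool:
    """Return True if every non-empty line follows a 'Step N:' style format."""
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    return all(re.match(rf"{prefix}\s*\d+:", ln) for ln in lines)

def adherence_score(trace: str, max_words: int) -> float:
    """Fraction of instruction checks the trace satisfies (0.0 to 1.0)."""
    checks = [within_word_limit(trace, max_words), uses_required_format(trace)]
    return sum(checks) / len(checks)

if __name__ == "__main__":
    trace = "Step 1: Restate the problem.\nStep 2: Compute the answer."
    print(adherence_score(trace, max_words=50))  # 1.0 when both constraints hold
```

Under this kind of metric, a model that answers correctly but ignores the word limit or formatting rule inside its reasoning still scores poorly, which is exactly the gap the study highlights.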
While LRMs often produce competent final answers, their reasoning processes frequently deviate from the specified instructions. This adherence gap widens as task complexity grows, raising fundamental questions about AI controllability in high-stakes applications.
The findings arrive as AI systems increasingly handle sensitive financial operations, from algorithmic trading to risk assessment. Market participants should note these reliability concerns when implementing LRM-powered solutions in cryptocurrency analytics or automated trading systems.